Search | Global Index Medicus

Construction of chemical information database based on optical structure recognition technique / 北京大学学报(医学版)

Chuan-Yu LV; Ming-Na LI; Liang-Ren ZHANG; Zhen-Ming LIU; Chuan-Yu LV; Ming-Na LI; Liang-Ren ZHANG; Zhen-Ming LIU.

Journal of Peking University(Health Sciences) ; (6): 352-357, 2018.

Article in Chinese | WPRIM | ID: wpr-691507

ABSTRACT

OBJECTIVE@#To create a protocol that could be used to construct chemical information database from scientific literature quickly and automatically.@*METHODS@#Scientific literature, patents and technical reports from different chemical disciplines were collected and stored in PDF format as fundamental datasets. Chemical structures were transformed from published documents and images to machine-readable data by using the name conversion technology and optical structure recognition tool CLiDE. In the process of molecular structure information extraction, Markush structures were enumerated into well-defined monomer molecules by means of QueryTools in molecule editor ChemDraw. Document management software EndNote X8 was applied to acquire bibliographical references involving title, author, journal and year of publication. Text mining toolkit ChemDataExtractor was adopted to retrieve information that could be used to populate structured chemical database from figures, tables, and textual paragraphs. After this step, detailed manual revision and annotation were conducted in order to ensure the accuracy and completeness of the data. In addition to the literature data, computing simulation platform Pipeline Pilot 7.5 was utilized to calculate the physical and chemical properties and predict molecular attributes. Furthermore, open database ChEMBL was linked to fetch known bioactivities, such as indications and targets. After information extraction and data expansion, five separate metadata files were generated, including molecular structure data file, molecular information, bibliographical references, predictable attributes and known bioactivities. Canonical simplified molecular input line entry specification as primary key, metadata files were associated through common key nodes including molecular number and PDF number to construct an integrated chemical information database.@*RESULTS@#A reasonable construction protocol of chemical information database was created successfully. A total of 174 research articles and 25 reviews published in Marine Drugs from January 2015 to June 2016 collected as essential data source, and an elementary marine natural product database named PKU-MNPD was built in accordance with this protocol, which contained 3 262 molecules and 19 821 records.@*CONCLUSION@#This data aggregation protocol is of great help for the chemical information database construction in accuracy, comprehensiveness and efficiency based on original documents. The structured chemical information database can facilitate the access to medical intelligence and accelerate the transformation of scientific research achievements.

Subject(s)

Data Mining , Databases, Chemical , Molecular Structure , Software

Exploiture and application of an internet-based Computation Platform for Integrative Pharmacology of Traditional Chinese Medicine / 中国中药杂志

Hai-Yu XU; Zhen-Ming LIU; Yan FU; Yan-Qiong ZHANG; Jian-Jun YU; Fei-Fei GUO; Shi-Huan TANG; Chuan-Yu LV; Jin SU; Ru-Yi CUI; Hong-Jun YANG.

China Journal of Chinese Materia Medica ; (24): 3633-3638, 2017.

Article in Chinese | WPRIM | ID: wpr-335808

ABSTRACT

Recently, integrative pharmacology(IP) has become a pivotal paradigm for the modernization of traditional Chinese medicines(TCM) and combinatorial drugs discovery, which is an interdisciplinary science for establishing the in vitro and in vivo correlation between absorption, distribution, metabolism, and excretion/pharmacokinetic(ADME/PK) profiles of TCM and the molecular networks of disease by the integration of the knowledge of multi-disciplinary and multi-stages. In the present study, an internet-based Computation Platform for IP of TCM(TCM-IP, www.tcmip.cn) is established to promote the development of the emerging discipline. Among them, a big data of TCM is an important resource for TCM-IP including Chinese Medicine Formula Database, Chinese Medical Herbs Database, Chemical Database of Chinese Medicine, Target Database for Disease and Symptoms, et al. Meanwhile, some data mining and bioinformatics approaches are critical technology for TCM-IP including the identification of the TCM constituents, ADME prediction, target prediction for the TCM constituents, network construction and analysis, et al. Furthermore, network beautification and individuation design are employed to meet the consumer's requirement. We firmly believe that TCM-IP is a very useful tool for the identification of active constituents of TCM and their involving potential molecular mechanism for therapeutics, which would wildly applied in quality evaluation, clinical repositioning, scientific discovery based on original thinking, prescription compatibility and new drug of TCM, et al.

ABSTRACT

Subject(s)

ABSTRACT

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL